ABSTRACT


Autism Spectrum Disorder (ASD) manifests as a multifaceted neurodevelopmental condition marked by 
difficulties in social interaction, communication, and repetitive behaviors. This paper proposes novel 
techniques for the detection of ASD using a combination of conventional ML algorithms and advanced 
ensemble techniques. Three datasets sourced from the UCI Repository, representing distinct age
groups (adults, adolescents, and children), are used to evaluate the proposed approaches to ASD diagnosis.
After data collection, the data are preprocessed. Next, the top features in each dataset are
analyzed, providing insights into the most discriminative features for ASD detection. Initially, conventional
ML algorithms, including logistic regression, KNN, SVM, decision trees, random forests, AdaBoost, and 
gradient boosting, are applied to establish a baseline for comparison. Subsequently, the effectiveness of 
ensemble techniques, including Bagging Meta-learner (BMA), Stacked Generalization, Stacking Classifier, 
and Voting Classifier, in improving detection performance is explored. Experimental findings demonstrate 
that the proposed ensemble techniques consistently outperform individual models across all datasets. Next,
a novel ensemble meta-features integration technique is introduced, which combines predictions from
individual ensemble models and achieves higher accuracy, precision,
recall, and F1-score. Finally, an extended analysis classifies ASD cases into age-specific
categories using ML models, with good results. Moreover, the techniques proposed in this research
offer scalability and adaptability, suitable for implementation in diverse clinical settings. This research 
contributes to advancing ML-based approaches for ASD diagnosis, offering novel techniques that can 
potentially enhance clinical decision-making. 
1. INTRODUCTION


Autism Spectrum Disorder (ASD) is a complex neurological disorder that
affects social interaction, communication, and various activities.
Diagnosing ASD is a multifaceted process that requires careful
consideration of various factors, including age, developmental stage,
and individual differences. The challenges associated with diagnosing
ASD are further compounded by the variability in symptoms and
presentation across different age groups. In adults, symptoms may be
masked or camouflaged, making it challenging to differentiate ASD from
other mental health conditions or personality traits. Adolescents,
undergoing rapid developmental changes and social transitions, may
exhibit fluctuating or evolving symptoms, complicating the diagnostic
process. Similarly, diagnosing ASD in children requires specialized
assessments and expertise to distinguish typical developmental
variations from early signs of ASD. Clinical observations and
standardized evaluations may miss subtle or uncommon symptoms,
resulting in underdiagnosis or misinterpretation, especially in
disadvantaged or underrepresented areas. Additionally, these
evaluations are subjective, resulting in inconsistent diagnostic
results. Recently developed ML and data analysis methods have improved
ASD diagnosis. ML models may find patterns and correlations in large
clinical and behavioral datasets that conventional diagnostic methods
cannot. In this paper, several ML techniques are applied to detect ASD
in different age groups. Beyond addressing the complexities of ASD
diagnosis, ML techniques offer a promising avenue for enhancing
diagnostic accuracy across diverse age groups, in line with the growing
field of precision medicine.


Journal of Theoretical and Applied Information Technology
31st May 2024. Vol.102. No. 10
© Little Lion Scientific
ISSN: 1992-8645    www.jatit.org    E-ISSN: 1817-3195


In [1], the authors proposed a scan-path-based ASD detection approach
based on dynamic gaze distribution changes. Two similarity metrics
were used to compare feature space and gaze behavior patterns between
ASD and typical development (TD), utilizing four sequence
characteristics from scan paths. The gaze patterns of ASD children
were more individualistic, with variances in attention duration and
vertical spatial distribution. LSTM networks outperformed classical
classifiers. However, while the study provided valuable insights into
gaze behavior patterns, it primarily focused on a specific aspect of
ASD detection and did not explore broader diagnostic techniques.
[2] presented a systematic review of
existing literature on the use of machine learning 
for ASD detection and proposed a diagnostic tool. 
The authors claimed that the utilization of transfer 
learning techniques improved the detection of ASD 
successfully. Although transfer learning is 
acknowledged as beneficial, the review could have 
provided more in-depth analysis and comparison of 
different transfer learning approaches for ASD 
detection. In [3], logistic regression, XGBoost, SVC,
and Naive Bayes were used to investigate ASD
detection parameters using open-source datasets.
XGBoost was the most effective. The analysis shows
that machine learning models can accurately predict
ASD status when optimized, suggesting that they
might support early diagnosis and improve the
chances of timely intervention. XGBoost performed
best on all datasets, including under cross-validation.
The study highlights the effectiveness of XGBoost in
ASD detection, but a deeper exploration of the
limitations and challenges of each algorithm could
provide more comprehensive insights. In [4], the
authors trained two ML classifiers, logistic
regression and SVM, locally to classify ASD
variables and diagnose ASD in children and adults
using a federated learning (FL) approach. These
classifiers' outputs were sent to a central server,
where a meta-classifier was trained to identify the
best method for detecting ASD in children and adults.
For feature extraction, four ASD patient datasets from
various sources, containing records of affected
children and adults, were used. While the study
introduces a federated learning approach for ASD
detection, it could benefit from discussing potential
privacy and data sharing concerns associated with
this method.

Machine learning was applied in [5] to improve
diagnostic precision and reduce diagnosis time.
Datasets were analyzed using SVM, Random Forest,
Naive Bayes, Logistic Regression, and KNN models,
resulting in predictive models. The results show that
logistic regression had the highest accuracy for the
chosen dataset. Convolutional neural networks and
particle swarm optimization were used to diagnose
ASD in [6]. An initial preprocessing step handles missing data.
SVM, NB, LR, and PSO-CNN are evaluated on 
ASD screening datasets. PSO-CNN outperformed 
other approaches in accuracy, especially for 
missing data. [7] used KNN, Logistic Regression, 
Decision Trees, Random Forest, Naive Bayes, and 
XGB Classifier to detect ASD based on user input. 
A classification method was given in [8] to study 
functional brain connections utilizing the newly 
built database ABIDE II, which pooled multisite 
data from three locations. Several classification 
techniques were used, including SVM, LR, and RF. 
RF outperformed the other two methods with a best
classification accuracy of 75%, much higher than
earlier efforts. [9] used four feature scaling (FS)
techniques and eight machine learning algorithms 
to classify datasets. Statistical evaluations 
determined the optimum classification and FS 
approaches for four typical ASD datasets. In [10], 
MRI brain scans were used to identify autism 
conditions using a deep CNN with a Dwarf
Mongoose optimized residual network (DM-ResNet).
Non-brain tissues were removed before
segmentation using a hybrid of Fuzzy C Means
(FCM) and a Gaussian Mixture Model. The
DM-optimized ResNet classified features extracted by
VGG-16 networks.

[11] employed ML models to diagnose 
ADHD children with ASD using handwriting 
characteristics. Japanese children's handwriting was 
analyzed statistically, and the resulting
characteristics were used to train ML systems for
ADHD detection. The Random Forest classifier was
the most accurate. This research shows that
handwriting patterns may distinguish children with
ADHD, children with ASD, and healthy children.
[12] refined the Gait Energy Image (GEI) with the Joint Energy
Image (JEI), which maintained just joint locations 
from video sequences. Prior to color mapping, 
depth was represented in binary pictures. JEI 
combined temporal and depth data into 2D. A CNN 
and machine learning models were preprocessed 
using Principal Component Analysis before JEI. 
CNN accuracy increased on the main and 
secondary datasets. In [13], an app was created to 
diagnose autistic and non-autistic children using 
ResNet-50 and Xception modules. The ResNet-50 
approach outperformed traditional methods in 
accuracy. [14] tested a hybrid, deep CNN-based 
transfer learning model to diagnose childhood 
autism. Various transfer learning techniques were 
used to extract features for classifiers. The most 
accurate was ResNet101V2 using SVM and
Logistic Regression. The suggested multi-valued 
autism classification model worked well, possibly 
helping future research and therapeutic 
applications. [15] reviewed the literature on
ML-based ASD diagnosis from the previous 5 years,
establishing a taxonomy of the research
landscape and addressing key topics. It covered
ML's classification process, MRI, representative 
studies, techniques, and biomarkers. 

Using the ABIDE dataset, ML algorithms
were used to distinguish individuals with ASD from
typically developing individuals [16].
The SVM, LSTM, and CNN algorithms were
examined. The best algorithm was CNN, with 95%
accuracy. In [17], AI and DL were used to screen children and
adults for autism. Compared to classic and hybrid
deep learning models, the proposed models proved 
superior. [18] used multiple feature selection 
methods on ASD datasets of toddlers, children, 
adolescents, and adults. After that, prediction 
accuracy, kappa statistics, the F1-measure, and
AUROC were used to evaluate different classifiers 
on these datasets. Additionally, a non-parametric 
statistical significance test assessed classifier 
performance. The authors in [19] used real-world 
health claims data to predict ASD risk in 18- to 30- 
month-olds based on their medical history. Early 
diagnosis and intervention are essential for
improving the long-term outcomes of children with
ASD; however, current screening techniques are
inaccurate. In [20], deep learning was studied for
ASD recognition. The authors found facial features
combined with a CNN effective for autism detection,
concluding that face recognition with automated
feature extraction and CNN classification might
detect autism spectrum disorder.
A novel dataset with 20 features was proposed for 
adult autism screening [21]. This dataset was 
expected to aid future studies in identifying autism's 
core components and classifying ASD patients. 
Behavioral studies suggest that the ten behavioral 
variables (AQ-10-Adult) and ten personality factors 
in this dataset might distinguish ASD patients from 
controls. [22] classified ASC patients using
machine learning models based on facial
expressions, gaze behavior, head pose, and
speech attributes. The highest accuracy (74%) was
attained with multimodal late fusion. In the unimodal
setting, facial expressions (73%) and vocal
characteristics (70%) worked well. The authors created an
online SIT to gather different data for machine
learning model construction, demonstrating
machine learning's promise in clinical diagnosis.
[23] used machine learning to predict ASD in
12-36-month-olds, as well as in 4-11-, 12-17-, and
18-year-olds. Advanced methodologies


and technology were applied for ASD analysis, 
including Smart Autism, a smart device-based 
automated autism screening tool, and Genetic 
Variant Analysis of Boys with Autism. An ML- 
based approach was created to detect early ASD 
indications in youngsters [24]. SVM and RF 
algorithms helped the system categorize ASD data 
more accurately than previous techniques. In [25], 
computer vision tools were explored to assess ASD 
children's abilities and emotions during videotaped 
intervention sessions. Using 300 videos, three deep
learning-based vision models were created: activity
comprehension, joint attention recognition, and 
emotion detection. On real-world footage, these 
models achieved 72.32%, 97%, 93.4%, and 95.1% 
accuracy. [26] divided ASD results into behavioral 
analysis, picture processing, and speech processing. 
The final section compared the efficiency of autism 
detection models or algorithms in each category. 

Despite the multitude of machine learning 
(ML) models developed for ASD detection, many 
fall short in providing robust and accurate 
diagnoses. Existing approaches often struggle with 
issues related to accuracy, scalability, and 
suitability for diverse clinical environments. 
To address these limitations, this study introduces innovative
techniques that integrate conventional ML 
algorithms with advanced ensemble methods, 
notably through a novel ensemble meta-features 
integration approach. By enhancing detection 
performance and offering scalability and 
adaptability, these novel approaches have the 
potential to revolutionize clinical decision-making 
in ASD diagnosis. 


2. METHOD 
The proposed framework for ASD detection is 
shown in Figure 1. In this work, data collection 
involved leveraging three distinct datasets sourced 
from the UCI Machine Learning Repository, 
representing different age groups: adults, 
adolescents, and children. The datasets served as
the foundation for exploring autism spectrum
disorder (ASD) detection. Data preprocessing
followed, in which missing values were handled and
irrelevant features were removed.
Next, top feature analysis was conducted. To
establish a baseline for comparison, conventional 
machine learning (ML) algorithms, including 
logistic regression, k-nearest neighbors, support 
vector machines, decision trees, random forests, 
AdaBoost, and gradient boosting, were applied. 

The performance of each model was evaluated
using standard metrics such as accuracy, precision,
recall, and F1-score. Next, four ensemble
techniques were employed: bagging
meta-learner (BMA), stacked generalization,
stacking classifier, and voting classifier. These
techniques were first applied individually; the
predictions from each ensemble model were then
combined using a meta-learner and a novel feature
merging approach. The meta-learner was trained
using predictions from the individual ensemble
models as input features, and the performance of the
resulting meta-ensemble model was evaluated using
the same standard metrics. Experimental evaluation
was performed to assess the effectiveness of each
technique and of the meta-ensemble approach. The
performance of the ensemble techniques was compared
against that of the individual ML models and the
meta-ensemble model.
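
The meta-features integration step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses synthetic data in place of the UCI datasets, three scikit-learn ensembles (bagging, soft voting, stacking) as stand-ins for the ensemble models, out-of-fold class probabilities as the meta-features, and logistic regression as the meta-learner. None of these specific choices is confirmed by the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder for an encoded ASD screening dataset (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=7)


def base_estimators():
    """Fresh base learners for each ensemble (illustrative subset)."""
    return [("lr", LogisticRegression(max_iter=1000)),
            ("knn", KNeighborsClassifier()),
            ("dt", DecisionTreeClassifier(random_state=7))]


ensembles = {
    "bagging": BaggingClassifier(DecisionTreeClassifier(random_state=7),
                                 n_estimators=20, random_state=7),
    "voting": VotingClassifier(base_estimators(), voting="soft"),
    "stacking": StackingClassifier(base_estimators(),
                                   final_estimator=LogisticRegression(max_iter=1000),
                                   cv=5),
}

# Meta-features for training: out-of-fold P(class 1) from each ensemble,
# so the meta-learner never sees predictions made on data the ensemble was fit on.
train_meta = np.column_stack([
    cross_val_predict(model, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for model in ensembles.values()
])

# Meta-features for testing: each ensemble refit on the full training split.
test_meta = np.column_stack([
    model.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    for model in ensembles.values()
])

# Meta-learner trained on the merged ensemble predictions.
meta_learner = LogisticRegression(max_iter=1000).fit(train_meta, y_train)
meta_accuracy = meta_learner.score(test_meta, y_test)
```

Using out-of-fold predictions for the training meta-features is one common way to limit leakage between the ensemble level and the meta level; whether the original study did so is not stated.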

Additionally, ASD classification was 
conducted based on three age groups using ML 
algorithms. This classification was performed to 
classify the samples into different age groups based 
on the input characteristics. In the discussion and 
conclusion sections, the experimental findings were 
discussed, highlighting the strengths and 
weaknesses of each technique and the novel feature 
merging approach. Insights into the potential 
implications of the proposed ensemble techniques 
and novel feature merging for ASD diagnosis were 
provided. 


2.1. ASD Data Collection 


The ASD data were gathered from the UCI
repository. Three datasets, namely Autism-Adult-Data
[27], Autism-Adolescent-Data [28], and
Autism-Child-Data [29], were collected.
Each dataset was designed for autism screening and
comprises 20 features: ten behavioral
traits (Q-Chat-10 [29]) and ten individual
characteristics. The 10 behavioral questions are
shown in Table 1. For questions A1-A9, a value of
"1" is assigned if the response is "Sometimes,"
"Rarely," or "Never." For question A10, a value of
"1" is assigned if the response is "Always,"
"Usually," or "Sometimes." In this way, features
A1 to A10 take the values 0 and 1. The
remaining features are collected through the
responses of users through the app [31].
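
The scoring rule above can be written as a short function. This is only a sketch of the rule as described: the helper name `encode_response` and the exact response strings are assumptions made for illustration.

```python
# Hypothetical helper illustrating the scoring rule described in the text.
# The five response labels are assumed from the options the text names.
A1_A9_POSITIVE = {"Sometimes", "Rarely", "Never"}
A10_POSITIVE = {"Always", "Usually", "Sometimes"}


def encode_response(question: int, response: str) -> int:
    """Return 1 if the response scores a point for the given question, else 0."""
    if not 1 <= question <= 10:
        raise ValueError("question must be between 1 and 10")
    positive = A10_POSITIVE if question == 10 else A1_A9_POSITIVE
    return int(response in positive)


# Example: encode one made-up respondent's answers to A1..A10.
answers = ["Always", "Rarely", "Usually", "Never", "Sometimes",
           "Always", "Usually", "Rarely", "Always", "Usually"]
scores = [encode_response(q, a) for q, a in enumerate(answers, start=1)]
```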


Figure 1. Proposed Model For ASD Detection


Table 1. Details Of Questions Used For Extracting 
Behavioral Features 


Question 


A1-Does the individual respond when their name is 
called? 

A2-How easily does the individual make eye 
contact? 

A3-Does the individual point to indicate their wants 
or interests? 

A4-Does the individual point to share interests with 
others? 

A5-Does the individual engage in pretend play 
activities? 

A6-Does the individual follow others' gaze? 
A7-Does the individual show signs of wanting to 
comfort others when they are upset? 

A8-How would you describe the individual's first 
words? 

A9-Does the individual use simple gestures, such as 
waving goodbye? 

A10-Does the individual engage in repetitive 
staring behaviors without apparent purpose? 


These datasets offer valuable insights into
enhancing ASD detection and identifying
influential autistic traits, addressing the scarcity of
comprehensive ASD-related datasets available for
analysis. Table 2 shows the total number of samples
in each dataset, along with the number of samples
labeled ASD "yes" and "no".


Table 2. Details Of Datasets Used


Dataset                  Samples with ASD "yes"   Samples with ASD "no"   Total samples
Autism-Adult-Data        189                      515                     704
Autism-Adolescent-Data                            41                      104
Autism-Child-Data        141                      152                     292


2.2. Preprocessing 


Preprocessing was conducted on the three datasets,
each of which has its own missing-data pattern. In
the first dataset (Autism-Adult-Data), missing data
in the 'ethnicity' and 'relation' features were
denoted by '?', and some entries in the 'age' feature
were left blank. The '?' entries in the 'ethnicity'
and 'relation' features were replaced with the most
frequent value observed in the respective column,
and the missing values in the 'age' feature were
imputed using mean or median imputation.

In the second dataset (Autism-Adolescent-Data),
missing data were again observed in the 'ethnicity'
and 'relation' features, denoted by '?'. These
entries were likewise replaced with the most frequent
value observed in the respective column.

Finally, in the third dataset (Autism-Child-Data),
missing data were found in the 'relation' and
'ethnicity' features, marked by '?', and some entries
in the 'age' feature were left blank. As with the
previous datasets, the '?' entries were replaced
with the most frequent values, and the missing
values in the 'age' feature were imputed using the
most frequent value.
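
The imputation steps described above might look like the following in pandas. This is a sketch on a made-up four-row frame: the column names follow the text, while the values and the helper name `impute` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for one of the UCI files; values are made up.
raw = pd.DataFrame({
    "age": [24.0, np.nan, 31.0, 27.0],
    "ethnicity": ["White-European", "?", "Asian", "White-European"],
    "relation": ["Self", "Self", "?", "Parent"],
})


def impute(df: pd.DataFrame, age_strategy: str = "median") -> pd.DataFrame:
    """Fill '?' categorical entries with the column mode and blank ages
    with the mean, median, or most frequent value."""
    out = df.replace("?", np.nan)
    for col in ("ethnicity", "relation"):
        out[col] = out[col].fillna(out[col].mode().iloc[0])
    if age_strategy == "mean":
        fill = out["age"].mean()
    elif age_strategy == "median":
        fill = out["age"].median()
    elif age_strategy == "most_frequent":
        fill = out["age"].mode().iloc[0]
    else:
        raise ValueError(f"unknown strategy: {age_strategy}")
    out["age"] = out["age"].fillna(fill)
    return out


clean = impute(raw, age_strategy="median")
```

Passing `age_strategy="most_frequent"` would correspond to the treatment described for the child dataset.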


2.3. Conventional ML Algorithms 


To set an ASD detection baseline, several conventional ML
techniques were used. Logistic regression, a
standard binary classification approach, was used to
map input features to the probability of ASD. KNN,
simple yet effective, classifies each data point by
the majority class of its nearest neighbors. SVM
finds the hyperplane that best separates ASD from
non-ASD instances in feature space by maximizing
the margin between classes. Decision trees, a
common classification tool, recursively partition the
data by feature values. Random forests combine
multiple decision trees to increase detection
accuracy and robustness. AdaBoost, a boosting
technique, trains weak learners sequentially on
reweighted data and combines them into a strong
learner. Gradient boosting, another boosting method,
adds weak learners in stages, each one correcting the
errors of its predecessors. These conventional ML
algorithms lay the groundwork for comparison
with more sophisticated ensemble methods.


2.4. Ensemble Learning Algorithms 


Advanced ensemble approaches are used with 
standard ML models to enhance Autism Spectrum 
Disorder (ASD) detection. Ensemble approaches 
use numerous models to provide a more accurate 
forecast. Bagging Meta-learner (BMA), Stacked 
Generalization, Stacking Classifier, and Voting 
Classifier are tested for improving ASD diagnostic 
accuracy and resilience. Ensemble approaches 
make use of base learners' variety and capacity to 
alleviate model shortcomings. Ensemble methods 
outperform individual models in accuracy, 
precision, recall, and F1-score measures across all
datasets. Stacked Generalization is an efficient 
ensemble strategy for integrating base learners and 
optimizing detection accuracy. Adding a meta- 
ensemble method that combines ensemble model 
predictions improves detection performance. This 
work uses ensemble techniques to promote ML- 
based ASD detection methods that may improve 
clinical decision-making. 

After applying ensemble models, the predicted 
results are merged, and a meta learner algorithm is 
applied for ASD prediction. 


2.5. Identification of best features for three 
datasets 


Feature extraction is a crucial step in machine
learning, particularly in the context of autism
spectrum disorder (ASD) research, where
identifying the most informative attributes can lead
to a better understanding and prediction of the
condition. Using the Random Forest algorithm
and one-hot encoding, the approach focused on
identifying the key features across the ASD
datasets representing the different age groups. By
training a Random Forest classifier and assessing
feature importance, the most significant contributors
to ASD diagnosis were identified, shedding light on
the factors most strongly associated with the
diagnosis. The best ten features identified in the
three datasets are shown in Table 3.
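
A minimal sketch of this ranking step is given below, assuming scikit-learn's impurity-based `feature_importances_` as the importance measure (the paper does not say which measure was used). The data are synthetic, and the column names are placeholders borrowed from the datasets' naming style. Summing the one-hot columns back into their parent feature is one way to treat all options of a categorical feature as a single feature.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; the study itself uses the UCI ASD screening datasets.
X_num, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                               random_state=0)
frame = pd.DataFrame(X_num, columns=[f"A{i}_Score" for i in range(1, 7)])
# One made-up categorical column to show how one-hot encoding expands a feature.
frame["gender"] = ["m" if v > 0 else "f" for v in X_num[:, 0]]

encoded = pd.get_dummies(frame, columns=["gender"])

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(encoded, y)

importances = pd.Series(forest.feature_importances_, index=encoded.columns)

# Map each dummy column (gender_f, gender_m) back to its parent feature name,
# then sum so that a categorical feature is ranked as a single entry.
parent = ["gender" if c.startswith("gender_") else c for c in encoded.columns]
grouped = importances.groupby(parent).sum().sort_values(ascending=False)
top_features = grouped.head(5)
```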

From Table 3, it is observed that most of
the best features are behavioral features. For
Autism-Adult-Data, the top-performing features are
those related to social interaction, such as
A9_Score, which evaluates an individual's use of
simple gestures like waving goodbye. Additionally,
A6_Score, assessing the individual's ability to
follow others' gaze, and A7_Score, gauging signs of
wanting to comfort others when upset, are
significant indicators of social responsiveness and
empathy. A3_Score and A4_Score,
which assess pointing behaviors to indicate wants
or share interests, shed light on communication
skills and joint attention abilities. These features
provide valuable insights into the individual's
capacity for reciprocal social interactions and
communication, pivotal aspects in ASD diagnosis
and intervention planning. A10_Score, focusing on
repetitive staring behaviors without apparent
purpose, and A1_Score, evaluating responsiveness
when the individual's name is called, offer insights
into repetitive behaviors and social responsiveness,
respectively, providing a comprehensive
understanding of an individual's behavioral profile.

In the Autism-Adolescent-Data, pivotal
features include A5_Score and A4_Score,
indicating engagement in pretend play and sharing
interests, respectively. Additionally, A10_Score
and A6_Score assess repetitive behaviors and social
interaction skills. Other significant features include
A9_Score and A3_Score, evaluating gestures and
pointing behaviors, and A8_Score and A7_Score,
highlighting communication and empathetic
behavior. Age serves as a critical factor in
understanding developmental trajectories. Together,
these features provide a comprehensive insight into
adolescent behavior in the context of autism
spectrum disorder.


Table 3. Best Features Including Behavioral Features


S.No   Autism-Adult-Data   Autism-Adolescent-Data   Autism-Child-Data
1      result              result                   result
2      A9_Score            A5_Score                 A4_Score
3      A6_Score            A4_Score                 A9_Score
4      A5_Score            A10_Score                A10_Score
5      A3_Score            A6_Score                 A8_Score
6      A4_Score            A9_Score                 A1_Score
7      A7_Score            A3_Score                 A3_Score
8      A2_Score            A8_Score                 A6_Score
9      A10_Score           A7_Score                 A5_Score
10     A1_Score            age                      A7_Score

In the Autism-Child-Data, significant
features include A4_Score and A9_Score,
reflecting the child's tendency to share interests and
use gestures. A10_Score indicates repetitive
behaviors, while A8_Score evaluates the child's
first words. A1_Score and A3_Score highlight
responsiveness and pointing behaviors,
respectively, and A6_Score signifies the ability to
follow others' gaze. A5_Score indicates
engagement in pretend play, and A7_Score assesses
empathetic behavior. These features collectively
provide valuable insights into child behavior in
autism spectrum disorder.

Next, the best-feature extraction process was
repeated with the behavioral features excluded.
Table 4 shows the resulting best features. Some
cells in Table 4 are blank because each categorical
feature has several options, and all options of a
single categorical feature are counted as one
important feature. As a result, Autism-Adult-Data has
5 important features, Autism-Adolescent-Data has 6,
and Autism-Child-Data has 7. The best features from
Table 3 and Table 4 are used in the
experiments.


Table 4. Best Features Excluding Behavioral Features


S.No   Autism-Adult-Data   Autism-Adolescent-Data   Autism-Child-Data
1      result              result                   result
2      age                 contry_of_res            contry_of_res
3                          age
4      ethnicity           ethnicity                ethnicity
5      austim              jaundice                 jaundice
6      -                   gender                   austim
7      -                   -                        gender

3. RESULTS AND DISCUSSION
3.1. Applying conventional ML algorithms


To establish a baseline for comparison, a range of
conventional ML algorithms were applied, including
logistic regression, KNN, SVC, decision trees,
random forests, AdaBoost, and gradient boosting.
Each algorithm was evaluated based on standard
performance metrics, including accuracy, precision,
recall, and F1-score. An 80%/20% train/test split is
used in all experiments.
Table 5 shows the results on the three datasets after
applying the conventional algorithms.
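
The baseline protocol can be sketched as a single loop over the seven classifiers with an 80/20 split. The sketch below assumes synthetic data, default hyperparameters, and a stratified split; none of these is specified in the paper, so the numbers it produces are not comparable to Table 5.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data as a placeholder for an encoded ASD dataset (assumption).
X, y = make_classification(n_samples=400, n_features=10, random_state=42)

# 80% training / 20% testing; stratification is an added assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "SVC": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
    "Gradient Boost": GradientBoostingClassifier(random_state=42),
}

results = {}
for name, model in models.items():
    pred = model.fit(X_train, y_train).predict(X_test)
    results[name] = {
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
        "f1": f1_score(y_test, pred),
        "accuracy": accuracy_score(y_test, pred),
    }
```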

Figure 2a shows the results of the ML
algorithms with autism-adult data, giving each
algorithm's precision, recall, F1-score, and
accuracy. Precision measures the
accuracy of positive predictions, with Logistic
Regression leading with 95%, followed closely by
SVC, KNN, Decision Tree, Random Forest,
AdaBoost, and Gradient Boost. Recall, or
sensitivity, indicates the model's ability to capture
all positive instances, with the decision tree
achieving 90%. F1-score, the harmonic mean of
precision and recall, highlights Random Forest's
balanced performance at 92.7%. Finally, accuracy
reflects the overall correctness of predictions, with
logistic regression achieving the highest at 97.3%.
It is observed that all ML models gave good results
with autism-adult data.
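
As a worked check of the harmonic-mean definition, the F1-score can be recomputed from the logistic regression values for Autism-Adult-Data in Table 5 (precision 95, recall 93). The symbols P and R are shorthand introduced here for precision and recall; they are not notation from the paper.

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 95 \times 93}{95 + 93}
    = \frac{17670}{188}
    \approx 93.99
```

This is close to the tabulated value of 93.9.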


Table 5. Results With ML Algorithms


Algorithm             Metric      Autism-Adult   Autism-Adolescent   Autism-Child
Logistic Regression   Precision   95             92                  94
                      Recall      93             94                  93.8
                      F1          93.9           92.9                92.4
                      Accuracy    97.3           96                  97
SVC                   Precision   94             93                  93.5
                      Recall      89             93                  91
                      F1          91.4           93                  92.2
                      Accuracy    95             94.5                94
KNN                   Precision   93.2           93                  92.5
                      Recall      89             93                  92
                      F1          91.4           93                  92.2
                      Accuracy    94.6           94                  94.6
Decision Tree         Precision   93.1           92.9                93
                      Recall      90             92                  91
                      F1          91.5           92.4                91.9
                      Accuracy    93.3           93                  93.3
Random Forest         Precision   93             92.7                94
                      Recall      92.5           93                  91.5
                      F1          92.7           92.8                92.7
                      Accuracy    95.2           94.6                95.2
AdaBoost              Precision   92             91.8                93
                      Recall      90.4           92                  92
                      F1          93.2           91.9                92.4
                      Accuracy    94.2           94                  94.2
Gradient Boost        Precision   92             93                  93
                      Recall      91.2           92                  93
                      F1          91.2           92.4                93
                      Accuracy    94.8           93                  94.8

Figure 2. Results Of ML Algorithms With A) Autism-Adult-Data B) Autism-Adolescent-Data


Figure 2b shows the results of the ML
algorithms with autism-adolescent data. Notably,
logistic regression achieves a precision of 92% and
a recall of 94%, resulting in an F1-score of 92.9%
and an accuracy of 96%. SVC demonstrates
balanced performance across all metrics, with a
precision, recall, and F1-score of 93% and an
accuracy of 94.5%. The decision tree showed a
slightly lower precision at 92.9% and an F1-score
of 92.4%, with an accuracy of 93%. The remaining
models also performed well for detection.

Figure 3 shows the results of the ML
algorithms with autism-child data. Logistic
regression achieves a precision of 94% and a recall
of 93.8%, resulting in an F1-score of 92.4% and an
accuracy of 97%. SVM demonstrates a precision of
93.5% and a recall of 91%, with an F1-score of
92.2% and an accuracy of 94%. Gradient Boost
shows the highest precision at 95% and recall at
94%, resulting in an F1-score of 93% and an
accuracy of 94.8%. Similarly, all the other applied
models showed strong results with
Autism-Child-Data for ASD detection.


Figure 3. Results Of ML Algorithms With Autism-Child-Data


3.2. Applying Ensemble Learning algorithms 


Following the establishment of the baseline models,
the effectiveness of ensemble techniques is
explored to further enhance ASD detection
performance. Four ensemble techniques, namely
bagging meta-learner (BMA), stacked
generalization, stacking classifier, and voting
classifier, were applied individually. The
performance of each model was evaluated using
several metrics.


3.3. Applying Bayesian Model Averaging 


Next, Bayesian Model Averaging (BMA) was
applied as an ensemble method to enhance the
detection of autism spectrum disorder (ASD). The
procedure begins by encoding categorical columns,
and then the data are divided into train and test
parts. Subsequently, the conventional classifiers
used in Section 3.1 are instantiated. BMA is an
ensemble technique that combines predictions from
multiple models by weighting them according to
their posterior probabilities given the observed
data. The analysis of detection reports and accuracy
scores underscored the efficacy of BMA in improving
ASD detection.
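
The paper does not state how the posterior model probabilities were obtained. The sketch below shows one simple approximation, offered as an assumption rather than as the authors' method: with a uniform prior over models, each model's weight is taken proportional to the exponential of its log-likelihood on a held-out validation split, and the final prediction averages the models' class probabilities with those weights. The data, the three-model subset, and the helper `log_likelihood` are all illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic placeholder data, split into fit / validation / test parts.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_fit, X_rest, y_fit, y_rest = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)

models = [
    LogisticRegression(max_iter=1000),
    KNeighborsClassifier(),
    RandomForestClassifier(n_estimators=50, random_state=0),
]
for model in models:
    model.fit(X_fit, y_fit)


def log_likelihood(model, X_eval, y_eval, eps=1e-6):
    """Sum of log P(true class | x) under the model, with clipping to avoid log(0)."""
    proba = np.clip(model.predict_proba(X_eval), eps, 1 - eps)
    return np.log(proba[np.arange(len(y_eval)), y_eval]).sum()


log_liks = np.array([log_likelihood(m, X_val, y_val) for m in models])

# Uniform prior: posterior weight is proportional to exp(log-likelihood).
# Subtracting the maximum first keeps the exponentials numerically stable.
weights = np.exp(log_liks - log_liks.max())
weights /= weights.sum()

# Weighted average of class probabilities, then pick the most probable class.
avg_proba = sum(w * m.predict_proba(X_test) for w, m in zip(weights, models))
bma_pred = avg_proba.argmax(axis=1)
bma_accuracy = (bma_pred == y_test).mean()
```

With likelihood-based weights of this kind, the best-fitting model often receives nearly all of the weight, so in practice the average can behave much like model selection.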


3.4. Applying Stacked Generalization with 
Neural Networks 


Next, stacked generalization with neural networks
was applied to each dataset. After encoding
categorical columns and splitting the data, various
base classifiers and a meta-learner (a neural
network) were instantiated. Using KFold
cross-validation with five splits, stacked
predictions were generated from the base
classifiers and used to train the meta-learner.
Finally, the performance was assessed using ASD
detection reports and accuracy scores,
demonstrating the effectiveness of stacked
generalization with neural networks in improving
ASD detection across all datasets.
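The generation of stacked predictions with five folds
can be sketched as follows. This is a standard-library
illustration only: the two base learners are simple
stand-ins rather than the classifiers used in this work,
the toy data is hypothetical, and the neural-network
meta-learner is omitted (it would be trained on the
returned meta-feature rows).

```python
def k_fold_indices(n_samples, n_splits=5):
    """Contiguous, unshuffled (train, test) index pairs for each fold."""
    sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
             for i in range(n_splits)]
    folds, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds

def out_of_fold_predictions(base_learners, X, y, n_splits=5):
    """One row per sample holding each base learner's held-out prediction."""
    meta_features = [[None] * len(base_learners) for _ in X]
    for train, test in k_fold_indices(len(X), n_splits):
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        for j, make_learner in enumerate(base_learners):
            predict = make_learner(X_tr, y_tr)  # fit on the training folds only
            for i in test:
                meta_features[i][j] = predict(X[i])
    return meta_features

def make_threshold_learner(X_tr, y_tr):
    """Stand-in learner: threshold the row sum midway between class means."""
    pos = [sum(x) for x, t in zip(X_tr, y_tr) if t == 1]
    neg = [sum(x) for x, t in zip(X_tr, y_tr) if t == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: int(sum(x) > cut)

def make_1nn_learner(X_tr, y_tr):
    """Stand-in learner: label of the closest training row."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in X_tr]
        return y_tr[dists.index(min(dists))]
    return predict

# Hypothetical binary screening answers and ASD labels.
X = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1],
     [0, 1, 0], [1, 1, 1], [0, 0, 0], [0, 1, 1], [1, 0, 0]]
y = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

meta_features = out_of_fold_predictions(
    [make_threshold_learner, make_1nn_learner], X, y, n_splits=5)
# meta_features is the training input for the meta-learner.
```

Because every prediction comes from a learner that never
saw the corresponding sample, the meta-learner is trained
on inputs that resemble what it will receive at test time.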


3.5. Applying Voting Classifier 


Next, a voting classifier was employed to enhance 
ASD detection across different datasets. Initially, 
categorical columns were encoded, and features and 
targets were selected from the dataset. Base models 
including Logistic Regression, KNN, SVC, 
Decision Tree, Random Forest, AdaBoost, and 
Gradient Boosting were instantiated for the voting 
classifier. 
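How the base-model outputs are combined can be sketched
as below. The source does not state whether hard
(majority) or soft (probability-averaging) voting was
used, so both are shown; the predictions and
probabilities are hypothetical.

```python
from collections import Counter

def hard_vote(predictions):
    """Majority label per sample; predictions[m][i] is model m's label."""
    n_samples = len(predictions[0])
    return [
        Counter(model[i] for model in predictions).most_common(1)[0][0]
        for i in range(n_samples)
    ]

def soft_vote(probabilities, threshold=0.5):
    """Average P(ASD) across models, then apply the threshold."""
    n_models = len(probabilities)
    n_samples = len(probabilities[0])
    means = [sum(m[i] for m in probabilities) / n_models
             for i in range(n_samples)]
    return [int(p >= threshold) for p in means]

# Hypothetical labels from the seven base models for three samples.
predictions = [
    [1, 0, 1],  # Logistic Regression
    [1, 0, 0],  # KNN
    [1, 1, 1],  # SVC
    [0, 0, 1],  # Decision Tree
    [1, 0, 1],  # Random Forest
    [1, 0, 0],  # AdaBoost
    [0, 0, 1],  # Gradient Boosting
]
hard = hard_vote(predictions)

# Hypothetical P(ASD) from three probabilistic models.
probabilities = [[0.9, 0.2, 0.6], [0.7, 0.4, 0.4], [0.8, 0.1, 0.7]]
soft = soft_vote(probabilities)
```

With seven base models and binary labels, hard voting
cannot produce a tie.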


3.6. Applying Stacking Classifier


Later, a stacking classifier was applied to improve
the detection of ASD. Initially, categorical columns
were encoded, and features and targets were
selected. Base models, including Logistic
Regression, KNN, SVM, Decision Tree, Random
Forest, AdaBoost, and Gradient Boosting, were
instantiated for the Stacking Classifier. The
Stacking Classifier was trained and evaluated using
a cross-validation strategy with five splits,
generating predictions for each fold. The result
report for the Stacking Classifier on each dataset
was analyzed, along with mean precision, recall,
and F1-score.
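The fold-wise averaging of precision, recall, and
F1-score can be sketched as follows, assuming the mean
is taken over per-fold metrics for the ASD class. The
labels and predictions for the five folds are
hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive class of one fold."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def mean_over_folds(fold_results):
    """Average each metric over (y_true, y_pred) pairs, one pair per fold."""
    per_fold = [precision_recall_f1(t, p) for t, p in fold_results]
    n = len(per_fold)
    return tuple(sum(m[k] for m in per_fold) / n for k in range(3))

# Hypothetical held-out labels and predictions for five folds.
fold_results = [
    ([1, 0, 1, 0], [1, 0, 1, 0]),
    ([1, 1, 0, 0], [1, 0, 0, 0]),  # one missed ASD case
    ([0, 1, 0, 1], [0, 1, 1, 1]),  # one false alarm
    ([1, 0, 0, 1], [1, 0, 0, 1]),
    ([0, 0, 1, 1], [0, 0, 1, 1]),
]
mean_precision, mean_recall, mean_f1 = mean_over_folds(fold_results)
```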


3.7. Results with ensemble models 


The results with ensemble models are shown in
Table 6. Figure 4a outlines the performance metrics
of the ensemble algorithms with Autism-Adult-Data.
The BMA algorithm displayed a precision of 99%,
a recall of 97%, and an F1-Score of 98%, with an
accuracy of 98.5%. Stacked generalization achieved
a precision of 98%, a recall of 97%, and an
F1-Score of 97.4%, yielding an accuracy of 98.2%.
The Voting Classifier showed a high recall of 98.5%
and attained an accuracy of 98%, while the Stacking
Classifier reached a precision of 98% and a recall
of 99%, with an F1-Score of 98.5% and an
accuracy of 99%.


Table 6. Results with Ensemble Algorithms (all values in %)


Algorithm                 Metric     Autism-      Autism-           Autism-
                                     Adult-Data   Adolescent-Data   Child-Data
Bayesian Model Averaging  Precision  99           99                99
                          Recall     97           98                97
                          F1         98           98.5              98
                          Accuracy   98.5         98.6              98
Stacked Generalization    Precision  98           98                98
                          Recall     97           97.8              98
                          F1         97.4         97.7              98
                          Accuracy   98.2         98                98
Voting Classifier         Precision  97           96                98
                          Recall     98.5         97                97
                          F1         96.7         96.5              97.5
                          Accuracy   98           96                98.2
Stacking Classifier       Precision  98           96                97
                          Recall     99           98                96
                          F1         98.5         97                96.5
                          Accuracy   99           97                97.5


Figure 4b shows ensemble model performance with
Autism-Adolescent-Data. The BMA algorithm
demonstrated a precision of 99% and a recall of
98%, resulting in an F1-Score of 98.5% and an
accuracy of 98.6%. Similarly, the stacked
generalization algorithm achieved a precision of
98% with a recall of 97.8%, yielding an F1-Score of
97.7% and an accuracy of 98%. The Voting
Classifier and Stacking Classifier showed slightly
lower performance: both reached a precision of
96%, with recall values of 97% and 98%, F1-Scores
of 96.5% and 97%, and accuracies of 96% and 97%,
respectively.


Figure 4. Results of Ensemble Algorithms with (a)
Autism-Adult-Data and (b) Autism-Adolescent-Data


Figure 5. Results of Ensemble Algorithms with
Autism-Child-Data


Figure 5 shows ensemble model performance with
Autism-Child-Data. The BMA algorithm achieves
a precision of 99% and a recall of 97%,
resulting in an F1-Score and accuracy of 98%.
Similarly, the stacked generalization algorithm
demonstrates strong precision and recall, both at
98%, leading to an F1-Score and accuracy of 98%.
The Voting Classifier achieves a precision of 98%, a
recall of 97%, and an F1-Score of 97.5%, resulting
in an accuracy of 98.2%. Lastly, the Stacking
Classifier yields a precision of 97% and a recall of
96%, resulting in an F1-Score of 96.5% and an
accuracy of 97.5%.


3.8. Applying Novel Feature Merging Approach 
with Meta-Learner 


In the next step, a novel approach was introduced to
merge features derived from the individual ensemble
models. The predictions of the individual ensemble
models are combined into a single set of
meta-features, on which a meta-learner is trained.
A random forest is used as the meta-learner in this
approach. The performance of the resulting
meta-ensemble model was evaluated using standard
metrics. The results of the Feature Merging Approach
with Meta-Learner are shown in Table 7 and Figure
6.
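The merging step can be sketched as follows. The
predictions of the four ensembles are column-stacked
into meta-feature rows, and a meta-learner is fitted on
them. To keep the sketch dependency-free, the
meta-learner here is a deliberately simplified stand-in
for a random forest (bagged depth-one trees on randomly
chosen features), not the implementation used in this
work; all names and values are hypothetical.

```python
import random
from collections import Counter

def merge_predictions(*ensemble_predictions):
    """Column-stack per-ensemble predictions: one meta-feature row per sample."""
    return [list(row) for row in zip(*ensemble_predictions)]

def majority(labels):
    """Most frequent label in a list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels, feature):
    """Depth-one tree: majority label for each observed value of one feature."""
    default = majority(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    table = {value: majority(group) for value, group in groups.items()}
    return lambda row: table.get(row[feature], default)

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Simplified random-forest stand-in: bagged stumps on random features."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        feature = rng.randrange(n_features)         # random feature per tree
        stumps.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return lambda row: majority([stump(row) for stump in stumps])

# Hypothetical training predictions from the four ensemble models.
bma      = [1, 0, 1, 0, 1, 0, 1, 1]
stacked  = [1, 0, 1, 0, 1, 0, 0, 1]
voting   = [1, 0, 1, 1, 1, 0, 1, 1]
stacking = [1, 0, 0, 0, 1, 0, 1, 1]
y_train  = [1, 0, 1, 0, 1, 0, 1, 1]

merged = merge_predictions(bma, stacked, voting, stacking)
meta_learner = fit_forest(merged, y_train)
final = [meta_learner([1, 1, 1, 1]), meta_learner([0, 0, 0, 0])]
```

In practice the meta-features should come from held-out
predictions, so that the meta-learner is not trained on
outputs the ensembles produced for their own training
samples.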


Table 7. Results with Novel Feature Merging Approach
with Meta-Learner (all values in %)


Metric     Autism-      Autism-           Autism-
           Adult-Data   Adolescent-Data   Child-Data
Precision  99           99                100
Recall     98.5         99                98
F1         98.7         99                99
Accuracy   99.2         99                98.7


The Novel Feature Merging Approach with
Meta-Learner exhibited strong performance across
all three ASD detection datasets. On
Autism-Adult-Data, the approach achieved
precision, recall, F1-Score, and accuracy values of
99%, 98.5%, 98.7%, and 99.2%, respectively.
Similarly, on Autism-Adolescent-Data, the method
maintained consistently high performance, with
precision, recall, F1-Score, and accuracy all
reaching 99%.


Figure 6. Performance of Feature Merging Approach
with Meta-Learner (x-axis: Performance Measure)


Notably, on Autism-Child-Data, the precision
reached 100%, accompanied by a recall of 98%, an
F1-Score of 99%, and an accuracy of 98.7%. These
results underscore the efficacy of the Novel Feature
Merging Approach with Meta-Learner in ASD
detection across age groups and its potential for
practical implementation in clinical settings to aid
accurate and timely diagnosis.


3.9. Proposed method comparison with other 
models 


Figure 7 compares the accuracy of the proposed
method with that of existing models. Existing
works have reported accuracies ranging from 75%
to 98%. Conventional ML approaches have
achieved accuracies of 75% [4] and 95% [22],
while ML algorithms have shown an accuracy of
75% [8]. Additionally, conventional deep learning
methods have reached an accuracy of 95% [16],
and computer vision techniques have
demonstrated an accuracy of 95% [25]. In
comparison, the proposed method surpasses these
existing approaches, achieving an accuracy of 99%,
which indicates the effectiveness of the approach
for ASD detection.


Figure 7. Proposed Method Comparison with Other
Models (x-axis: Method Used; y-axis: Accuracy
Achieved; methods: Conventional ML [4], ML
Algorithms [8], Conventional DL [16], Conventional
ML [22], Computer Vision [25], Proposed method)


This paper introduced novel contributions to
Autism Spectrum Disorder (ASD) detection by
integrating conventional machine learning (ML)
algorithms with advanced ensemble techniques
across diverse age groups. Unlike prior research,
which often focused on individual algorithms or
specific age cohorts, this approach utilized three
datasets representing adults, adolescents, and
children. A novel ensemble meta-features
integration technique enhanced detection
performance, achieving higher accuracy. While
the approach offers advantages such as improved
performance and comprehensive analysis, its
challenges include the complexity of ensemble
techniques and the need for further optimization.
Future research directions include refining
ensemble techniques, integrating additional data
modalities, conducting longitudinal studies, and
prioritizing transparent and ethical AI solutions for
ASD detection.


3.10. ASD Classification based on age groups 


Building upon the ASD detection step, the analysis
was extended to classify ASD cases into
age-specific categories. By combining features from
the three separate datasets, namely
Autism-Adult-Data, Autism-Adolescent-Data, and
Autism-Child-Data, a unified dataset was
constructed to predict the age-specific ASD
category from a range of demographic and
diagnostic variables. These variables included the
scores from 'A1_Score' to 'A10_Score',
demographic factors such as 'age', 'gender', and
'ethnicity', and diagnostic indicators such as
'jundice' and 'austim'. Here, 'age_desc' is designated
as the target variable, taking the value "18 and
more" for adults, "12-16 years" for adolescents,
and "4-11 years" for children. Using ML
algorithms including logistic regression, KNN,
SVM, DT, RF, AdaBoost, and gradient boosting,
an accuracy exceeding 97% was achieved across all
models. This result underscores the efficacy of ML
techniques in accurately classifying ASD cases
across age cohorts.
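The construction of the unified dataset can be sketched
as follows. The column names and the three 'age_desc'
values are taken from the description above; the
function name, the group keys, and the example rows are
hypothetical, and assigning 'age_desc' from the source
dataset is an assumption of this sketch.

```python
AGE_LABELS = {
    "adult": "18 and more",
    "adolescent": "12-16 years",
    "child": "4-11 years",
}

FEATURES = [f"A{i}_Score" for i in range(1, 11)] + [
    "age", "gender", "ethnicity", "jundice", "austim",
]

def build_unified_dataset(groups):
    """Concatenate per-group rows, keep FEATURES, and add the age_desc target."""
    unified = []
    for group, rows in groups.items():
        for row in rows:
            record = {name: row[name] for name in FEATURES}
            record["age_desc"] = AGE_LABELS[group]  # assumption: label by source
            unified.append(record)
    return unified

def example_row(age):
    """Hypothetical record with placeholder answers, for illustration only."""
    row = {f"A{i}_Score": i % 2 for i in range(1, 11)}
    row.update({"age": age, "gender": "f", "ethnicity": "unknown",
                "jundice": "no", "austim": "no"})
    return row

unified = build_unified_dataset({
    "adult": [example_row(30)],
    "adolescent": [example_row(14)],
    "child": [example_row(7)],
})
targets = [record["age_desc"] for record in unified]
```

The resulting list of records can then be encoded and
passed to any of the classifiers listed above, with
'age_desc' as the target.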


4. CONCLUSION 


This paper proposed novel techniques for the
detection of autism spectrum disorder (ASD) using
a combination of conventional ML algorithms and
advanced ensemble techniques. Employing datasets
for three distinct age groups (adults, adolescents,
and children), novel strategies were introduced to
enhance ASD diagnosis accuracy. Through data
preprocessing and analysis of top features,
discriminative features for ASD detection were
identified. The initial application of conventional
ML algorithms established a baseline for
comparison, followed by an exploration of the
effectiveness of ensemble techniques. The
experimental findings consistently demonstrated
that ensemble techniques outperformed individual
models across all datasets, achieving higher
accuracy, precision, recall, and F1-score. Moreover,
the introduction of a novel ensemble meta-feature
integration technique further enhanced
performance. In addition to enhancing diagnostic
accuracy, the implementation of these techniques
lays a foundation for potential real-time ASD
diagnosis, facilitating timely intervention and
support. With the highest accuracy achieved on
Autism-Adult-Data (99.2%), Autism-Adolescent-Data
(99%), and Autism-Child-Data (98.7%), this research
advanced ML-based approaches for ASD diagnosis.
Additionally, ASD classification was performed
across the age groups, with accuracy exceeding 97%
for all models. These novel techniques have
the potential to enhance clinical decision-making in
ASD diagnosis, marking a step forward
in addressing the challenges posed by ASD as a
neurodevelopmental condition.